Standard Arabic formalization and linguistic platform for its analysis

Author

  • Slim MESFAR
Abstract

Since the early 1960s, beginning with the first automatic analyzer proposed by David Cohen, one of the first theorists of NLP [1], research in natural language processing has continued, with particular attention to the automatic processing of Arabic. In 1983, using a minimalist morphological analysis based on the theory that any Arabic form is generated from a root and a pattern, researchers developed the first two-level morphological analyzer for Arabic (Koskenniemi 1983). This work was incorporated into the ALPNET project (Beesley and Buckwalter 1989), which used finite-state technology that allowed only the concatenation of morphemes in the morphotactics. Since 1996, the Xerox research centre has enhanced this system with an algorithm that automatically combines roots and patterns to generate stems; this research builds on the ALPNET dictionaries, which were considerably rebuilt using Xerox finite-state technology (Beesley 2001). This technology is computationally very efficient for natural language processing, and it is used within the NooJ development environment (Silberztein 2006). The use of finite-state machines within NooJ is particularly attractive: they can generate and analyse several thousand words per second. This paper describes NooJ as the linguistic platform used to formalize the vocabulary of Standard Arabic and to analyse it.
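The root-and-pattern idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the Xerox algorithm and not NooJ's formalism; the function name `interdigitate`, the `C` slot notation, and the Latin transliterations are assumptions made only for this illustration.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill each 'C' slot of a pattern with the next root consonant.

    Hypothetical helper: real analyzers work on Arabic script with
    finite-state machinery and handle weak roots, gemination, and so on.
    """
    if pattern.count("C") != len(root):
        raise ValueError("pattern must have one C slot per root consonant")
    radicals = iter(root)
    # Copy pattern characters as-is, replacing each C with a radical.
    return "".join(next(radicals) if ch == "C" else ch for ch in pattern)


if __name__ == "__main__":
    # Transliterated root k-t-b combined with three illustrative patterns.
    for pattern in ("CaCaC", "CaaCiC", "maCCuuC"):
        print(pattern, "->", interdigitate("ktb", pattern))
```

Under these assumptions, a single root combined with a small set of patterns yields several stems, which is the kind of combination the abstract attributes to the enhanced system.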

Similar resources

Antelope: an industrial platform for linguistic processing

The Antelope linguistic platform, inspired by Meaning-Text Theory, targets the syntactic and semantic analysis of texts, and can handle large corpora. Antelope integrates several pre-existing (parsing) components as well as broad-coverage linguistic data originating from various sources. Efforts towards integration of all components nonetheless make for a homogeneous platform. Our direct contri...

On the Optical Character Recognition and Machine Translation Technology in Arabic: Problems and Solutions

The report addresses the basic problems of formalizing the Arabic language, based on an analysis of linguistic errors in software products. Reviewing the operating principles of modern information systems, the authors conclude that the existing methods of Arabic formalization show a shift towards the technological aspects of the linguistic processing of facts, however, t...

Bourdieu and Genette in Paratext: How Sociology Counts in Linguistic Reasoning

While Bourdieu’s theory of practice provides an ensemble of conceptual tools which analyze patterns of social life that are irreducible to the limiting view of individuals as free-acting agents, Genette’s paratextual theory offers the metalanguage necessary to account for the microcosm of paratext as a linguistic space. This study takes issue with unidirectional approaches to researching parate...

Revisiting the Arabic Diglossic Situation and Highlighting the Socio-Cultural Factors Shaping Language Use in Light of Auer’s (2005) Model

In the field of Arabic sociolinguistics, diglossia has been an interesting linguistic inquiry since it was first discussed by Ferguson in 1959. Since then, diglossia has been discussed, expanded, and revisited by Badawi (1973), Hudson (2002), and Albirini (2016) among others. While the discussion of the Arabic diglossic situation highlights the existence of two separate codes (High and Lo...

LAMP: A Multimodal Web Platform for Collaborative Linguistic Analysis

This paper describes the underlying software platform used to develop and publish annotations for the Quranic Arabic Corpus (QAC). The QAC (Dukes, Atwell and Habash, 2011) is a multimodal language resource that integrates deep tagging, interlinear translation, multiple speech recordings, visualization and collaborative analysis for the Classical Arabic language of the Quran. Available online at...

Publication date: 2007